Take-Home Exercise 1: Geospatial analytics for Social Good - Myanmar Arm Conflict Case Study
Author
Han Ming Yan
Published
September 2, 2024
Modified
September 23, 2024
Abstract
In this study, you are tasked to apply spatial point patterns analysis methods to discover the spatial and spatio-temporal distribution of armed conflict in Myanmar.
Keywords
Spatial Analysis, Point Patterns, First-Order Analysis, Second-Order Analysis, Monte Carlo simulation, Myanmar, Myanmar Civil War Crisis
1Overview
This exercise focuses on applying geospatial analytics to explore and analyze the impact of the armed conflict in Myanmar. The aim is to use spatial data and analytical techniques to better understand the conflict’s dynamics, identify affected regions, and assess the humanitarian implications. This case study offers a real-world application of geospatial tools for social good, demonstrating how data-driven insights can inform decision-making in crisis situations.
1.1Data used
For this study, I used the armed conflict data for Myanmar from January 2021 to June 2024, sourced from Armed Conflict Location & Event Data (ACLED). I focused on four primary event types related to conflict:
The specific tasks of this take-home exercise are as follows:
Using appropriate function of sf and tidyverse packages, import and transform the downloaded armed conflict data and administrative boundary data into sf tibble data.frames.
Using the geospatial data sets prepared, derive quarterly KDE layers.
Using the geospatial data sets prepared, perform 2nd-Order Spatial Point Patterns Analysis.
Using the geospatial data sets prepared, derive quarterly spatio-temporal KDE layers.
Using the geospatial data sets prepared, perform 2nd-Order Spatio-temporal Point Patterns Analysis.
Using appropriate tmap functions, display the KDE and Spatio-temporal KDE layers on openstreetmap of Myanmar.
Describe the spatial patterns revealed by the KDE and Spatio-temporal KDE maps.
Geospatial Data Wrangling (20 marks): This is an important aspect of geospatial analytics. You will be assessed on your ability to employ appropriate R functions from various R packages specifically designed for modern data science such as readxl, tidyverse (tidyr, dplyr, ggplot2), sf just to mention a few of them, to perform the entire geospatial data wrangling processes, including. This is not limited to data import, data extraction, data cleaning and data transformation. Besides assessing your ability to use the R functions, this criterion also includes your ability to clean and derive appropriate variables to meet the analysis need.
Geospatial Analysis (30 marks): In this exercise, you are expected to use the appropriate spatial point patterns analysis methods and R packages introduced in class to analysis the geospatial data prepared. You will be assessed on your ability to derive analytical products by using appropriate kernel estimation techniques.
Geovisualisation and geocommunication (20 marks): In this section, you will be assessed on your ability to communicate Exploratory Spatial Data Analysis and Confirmatory Spatial Data Analysis results in layman friendly visual representations. This course is geospatial centric, hence, it is important for you to demonstrate your competency in using appropriate geovisualisation techniques to reveal and communicate the findings of your analysis.
Reproducibility (15 marks): This is an important learning outcome of this exercise. You will be assessed on your ability to provide a comprehensive documentation of the analysis procedures in the form of code chunks of Quarto. It is important to note that it is not enough by merely providing the code chunk without any explanation on the purpose and R function(s) used.
Bonus (15 marks): Demonstrate your ability to employ methods beyond what you had learned in class to gain insights from the data.
1.2Installing and Loading the R packages
Before getting into the analysis, I need to load specific R packages that provide essential functions for handling spatial and point pattern data. Packages like sf, spatstat, and sparr offer tools for working with spatial features and conducting point pattern analysis, including kernel density estimation (KDE) and second-order spatial analysis. In total, five R packages will be used, they are:
sf, a relatively new R package specially designed to import, manage and process vector-based geospatial data in R.
spatstat, which has a wide range of useful functions for point pattern analysis. In this hands-on exercise, it will be used to perform 1st- and 2nd-order spatial point patterns analysis and derive kernel density estimation (KDE) layer.
sparr, provides functions to estimate fixed and adaptive kernel
raster which reads, writes, manipulates, analyses and model of gridded spatial data (i.e. raster). In this hands-on exercise, it will be used to convert image output generate by spatstat into raster format.
maptools which provides a set of tools for manipulating geographic data. In this hands-on exercise, we mainly use it to convert Spatial objects into ppp format of spatstat. (Deprecated as of October 2023)
tmap which provides functions for plotting cartographic quality static point patterns maps or interactive maps by using leaflet API.
2 Installing maptools
maptools is retired as of October 2023 and binary is removed from CRAN. However, we can still download from Posit Public Package Manager snapshots by using the code chunk specified
#|eval: false # it will no longer run anymore after first run#|warning: falseinstall.packages("maptools", repos ="https://packagemanager.posit.co/cran/2023-10-13")
Installing package into 'C:/Users/mngyn/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
package 'maptools' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\mngyn\AppData\Local\Temp\RtmpI1udd8\downloaded_packages
Use the code chunk below to install and launch the five R packages.
I then import the armed conflict data (ACLED_Myanmar.csv) and the KML files containing Myanmar’s administrative boundaries. The csv file contains the conflict event data, including latitude, longitude, and the type of event (e.g., battles or explosions). I used the readr::read_csv() function to load the CSV data and convert it into an sf object, which allows me to handle the spatial features (like coordinates) directly in R.
Rows: 51553 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (20): event_id_cnty, event_date, disorder_type, event_type, sub_event_ty...
dbl (12): year, time_precision, inter1, inter2, interaction, iso, latitude, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Next, I imported the KML files to give me Myanmar’s administrative boundaries at different levels (e.g., provinces, districts). These boundaries will help contextualize the spatial patterns of conflict events within the country’s political borders.
In this step, I examined the column names of the imported dataset to better understand the variables available and decide which ones were relevant to my analysis. I noticed that variables such as country, region, and ISO codes were redundant since all data points are specific to Myanmar. I removed these columns to simplify the dataset and reduce unnecessary clutter.
Simple feature collection with 21309 features and 26 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -207135 ymin: 1103500 xmax: 604775.1 ymax: 3027042
Projected CRS: WGS 84 / UTM zone 47N
# A tibble: 21,309 × 27
event_id_cnty event_date year time_precision disorder_type event_type
* <chr> <date> <dbl> <dbl> <chr> <chr>
1 MMR64313 2024-06-30 2024 1 Political violence Battles
2 MMR64320 2024-06-30 2024 1 Political violence Battles
3 MMR64321 2024-06-30 2024 1 Political violence Battles
4 MMR64323 2024-06-30 2024 1 Political violence Battles
5 MMR64325 2024-06-30 2024 1 Political violence Battles
6 MMR64326 2024-06-30 2024 1 Political violence Battles
7 MMR64328 2024-06-30 2024 1 Political violence Battles
8 MMR64330 2024-06-30 2024 1 Political violence Battles
9 MMR64331 2024-06-30 2024 1 Political violence Battles
10 MMR64332 2024-06-30 2024 1 Political violence Battles
# ℹ 21,299 more rows
# ℹ 21 more variables: sub_event_type <chr>, actor1 <chr>, assoc_actor_1 <chr>,
# inter1 <dbl>, actor2 <chr>, assoc_actor_2 <chr>, inter2 <dbl>,
# interaction <dbl>, civilian_targeting <chr>, admin1 <chr>, admin2 <chr>,
# admin3 <chr>, location <chr>, geo_precision <dbl>, source <chr>,
# source_scale <chr>, fatalities <dbl>, tags <chr>, timestamp <dbl>,
# population_best <dbl>, geometry <POINT [m]>
tm_shape(adm2_polygons) +tm_borders(col ="green") +tm_shape(adm3_polygons) +tm_borders(col ="black") +tm_shape(adm0) +tm_dots(col ="blue", size =0.1) +tm_shape(acled_sf) +tm_dots(col ="red", size =0.1) # Plot the conflict data points in red
2.2.1 Save data into the rds folder
We do this so that we do not need to re-run the previous codes, to keep the derived data in a .rds file
# | eval: falsewrite_rds(acled_sf, "data/rds/acled_sf.rds") # keep in tibble format
2.2.2 To read back the data, (if we do not have it in our environment yet)
2.3Quarterly Data Segmentation
Since the objective is to study the temporal patterns of armed conflict over time, I segmented the dataset by quarters (Q1, Q2, Q3, and Q4) for each year. This approach makes it easier to analyze changes in conflict dynamics throughout the years.
# Convert event_date to quarters for quarterly analysis, shift quarter to 4th column for easier referencemyanmar_sf <- acled_sf %>%mutate(quarter =paste0(year(event_date), " Q", quarter(event_date))) %>% dplyr::select(1:3, quarter, everything())
I further segmented the data based on conflict event types (battles, explosions, strategic developments, and violence against civilians) for each quarter. By doing this, I can isolate different types of conflict events and analyze their distribution over time.
# Get all quarters from the dataunique_quarters <-unique(myanmar_sf$quarter)event_types <-c("Battles"="b","Strategic developments"="sd","Violence against civilians"="vac","Explosions/Remote violence"="erv")for (q in unique_quarters) { df_quarter <- myanmar_sf %>%filter(quarter == q)for (event innames(event_types)) { df_event <- df_quarter %>%filter(event_type == event)# for debugging:#print(paste0("Processing quarter: ", q, ", event: ", event, " - Rows: ", nrow(df_event))) df_name <-paste0("myanmar_sf_", gsub(" ", "_", q), "_", event_types[[event]])assign(df_name, df_event) }}ls(pattern ="myanmar_sf_")
Since I have split the data frames into their respective year, quarter and categories, i will remove the original quarter only data frames and also remove all the empty data frames (0 rows.)
old_quarter_dfs <-ls(pattern ="^myanmar_sf_\\d{4}_Q\\d$")# Remove the old quarterly data framesrm(list = old_quarter_dfs)all_dfs <-ls(pattern ="^myanmar_sf_")# Loop through each data frame and remove those with 0 rowsfor (df_name in all_dfs) { df <-get(df_name)if (nrow(df) ==0) {rm(list = df_name) }}myanmar_sfs <-ls(pattern ="myanmar_sf_")
2.4Converting myanmar_sf frame to sp’s Spatial* class
since we have 14 quarters to convert, instead of converting all one by one, i will use a for-each loop to convert them in one code chunk
myanmar_sfs <-ls(pattern ="^myanmar_sf_")# Loop through each of these data frames and convert to Spatial objectsfor (df_name in myanmar_sfs) { df <-get(df_name) sp_object <-as_Spatial(df) sp_name <-gsub("sf", "sp", df_name)assign(sp_name, sp_object)}myanmar_sps <-ls(pattern ="myanmar_sp_")myanmar_sps
2.6Converting the generic spatial format into spatstat’s ppp format
Next, I will convert the spatial data frames into ppp (planar point pattern) objects. This step is crucial because ppp objects are required for performing point pattern analyses like kernel density estimation and second-order analyses in spatstat. A ppp object holds the coordinates of events, along with optional window boundaries and marks (such as event type or time).
for (df_name in myanmar_sfs) { df <-get(df_name) ppp_object <-as.ppp(df) ppp_name <-gsub("sf", "ppp", df_name)assign(ppp_name, ppp_object)}
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Warning in as.ppp.sf(df): only first attribute column is used for marks
Marked planar point pattern: 1047 points
marks are of storage type 'character'
window: rectangle = [-191409.1, 518300.4] x [1132472.1, 2936770.4] units
plot(myanmar_ppp_2023_Q3_b)
Warning in default.charmap(ntypes, chars): Too many types to display every type
as a different character
Warning: Only 10 out of 1047 symbols are shown in the symbol map
summary(myanmar_ppp_2023_Q3_b)
Marked planar point pattern: 1047 points
Average intensity 8.176317e-10 points per square unit
Coordinates are given to 13 decimal places
marks are of type 'character'
Summary:
Length Class Mode
1047 character character
Window: rectangle = [-191409.1, 518300.4] x [1132472.1, 2936770.4] units
(709700 x 1804000 units)
Window area = 1.28053e+12 square units
any(duplicated(myanmar_ppp_2023_Q3_b))
[1] FALSE
sum(multiplicity(myanmar_ppp_2023_Q3_b) >1)
[1] 0
2.7Creating owin objects
I also created corresponding owin (window) objects, which define the boundaries of the region under analysis. These owin objects allow me to confine my spatial analysis to the boundaries of Myanmar.
for (df_name in myanmar_sfs) { ppp_name <-gsub("sf", "ppp", df_name) ppp_object <-get(ppp_name) owin_object <-as.owin(ppp_object) owin_name <-gsub("sf", "owin", df_name)assign(owin_name, owin_object)}ls(pattern ="myanmar_owin_")
Window: rectangle = [-206931.7, 572438.5] x [1359954.3, 3026504.9] units
(779400 x 1667000 units)
Window area = 1.29886e+12 square units
3 Second-order Spatial Point Patterns Analysis
3.0.1G-Function Analysis
The G-function measures the distribution of the distances from an arbitrary event to its nearest event.
# Loop through each quarterly ppp object to compute G-functionfor (ppp_name in myanmar_ppps) {# Get the ppp object ppp_object <-get(ppp_name)# Compute G-function estimation G <-Gest(ppp_object, correction ="border")# Plot the G-functionplot(G, xlim =c(0, 500), main =paste("G-Function for", ppp_name))# Perform Monte Carlo test with G-function G_csr <-envelope(ppp_object, Gest, nsim =999)# Plot the Monte Carlo test resultsplot(G_csr, main =paste("CSR Test - G-Function for", ppp_name))}
The F-function estimates the empty space function F(r) from a point pattern in an arbitrary window shape.
# Loop through each quarterly ppp object to compute F-functionfor (ppp_name in myanmar_ppps) {# Get the ppp object ppp_object <-get(ppp_name)# Compute F-function estimation F <-Fest(ppp_object)# Plot the F-functionplot(F, main =paste("F-Function for", ppp_name)) F_csr <-envelope(ppp_object, Fest, nsim =999)plot(F_csr, main =paste("CSR Test - F-Function for", ppp_name))}
The L-function is a linearized version of the K-function, making interpretation easier.
# Loop through each quarterly ppp object to compute L-functionfor (ppp_name in myanmar_ppps) {# Get the ppp object ppp_object <-get(ppp_name) G <-Gest(ppp_object, correction ="border")plot(G, xlim =c(0, 500), main =paste("G-Function for", ppp_name)) G_csr <-envelope(ppp_object, Gest, nsim =999)plot(G_csr, main =paste("CSR Test - G-Function for", ppp_name))}
Using myanmar_ppp_2021_Q1_Strategic_developments to show case the sigma values for the automatic Bandwidth methods.
bw.CvL(myanmar_ppp_2021_Q1_sd)
sigma
235083.6
bw.scott(myanmar_ppp_2021_Q1_sd)
sigma.x sigma.y
49839.57 121971.78
bw.ppl(myanmar_ppp_2021_Q1_sd)
sigma
15557.15
bw.diggle(myanmar_ppp_2021_Q1_sd)
sigma
191.6059
Function
Methodology
Advantages
Limitations
bw.CvL
Least Squares Cross-Validation
Data-driven, adaptive
Computationally intensive, sensitive to outliers
bw.scott
Scott’s Rule of Thumb
Simple, quick
Assumes Gaussian distribution, non-adaptive
bw.ppl
Penalized Profile Likelihood
Balances fit and smoothness, flexible
Complex, requires penalty parameter tuning
bw.diggle
Diggle’s Method (Edge-corrected, adaptive)
Tailored for spatial data, handles edge effects
More specific, potentially computationally demanding
I have explored different automatic bandwidth selection methods for kernel density estimation, applied to my spatio-temporal data on strategic developments in Myanmar. Here’s what each method represents and how I interpreted the results:
bw.CvL (Cross-Validation for Likelihood):
Sigma: 235083.6
Explanation: This method finds the bandwidth by minimizing prediction error, balancing overfitting and underfitting. It’s data-driven but also computationally more intensive. The large sigma value of 235083.6 means a high level of smoothing, where smaller clusters of points are less likely to be mentioned. In my case, this would give a broader, more generalized view of the conflict hotspots.
bw.scott (Scott’s Rule of Thumb):
Sigma.x: 49839.57, Sigma.y: 121971.78
Explanation: The Scott’s method provides bandwidth estimates based on a rule of thumb, with different values for the x and y directions. It compute faster than cross-validation and the two sigma values show how much smoothing is applied in each spatial direction. Scott’s rule of thumb allows for a more moderate smoothing, capturing both large-scale trends and some local variations in my data.
bw.ppl (Penalized Profile Likelihood):
Sigma: 15557.15
Explanation: The Penalized Profile Likelihood method balances model fit and smoothness, with a penalty to avoid over-fitting. The lower sigma value (15557.15) means it has more localized smoothing than the Cross-Validation or Scott’s method, meaning it will better highlight smaller clusters in my data while still avoiding over-fitting.
bw.diggle (Diggle’s Method):
Sigma: 191.6059
Explanation: Diggle’s method is particularly good for handling edge effects in spatial data. The very small sigma value of 191.6059 means that this method emphasizes fine-scale local clusters, giving a much more detailed view of the spatial point pattern. In this case, it would allow me to detect even the smallest clusters of conflict events, which could be crucial for my analysis.
3.0.6 What These Results Mean for My Analysis:
bw.CvL gave me a very broad view of the conflict patterns, smoothing over smaller, more detailed clusters. This could be useful for understanding general trends over large areas.
bw.scott provided me with moderate smoothing, balancing between broad and fine trends.
bw.ppl gave a more localized view, which is useful for detecting finer clusters without overfitting.
bw.diggle applied the least smoothing, preserving the smallest details in my data, which could help in identifying specific conflict hotspots.
To perform spatio-temporal analysis, we need to create new ppp objects that include a temporal component. For this case, we’ll use the event_type as the temporal mark.
event_type_mapping <-as.numeric(as.factor(myanmar_sf$event_type))names(event_type_mapping) <-levels(as.factor(myanmar_sf$event_type))# Convert quarterly data frames to ppp objects with event_type as the markfor (df_name inls(pattern ="^myanmar_sf_\\d{4}_Q\\d_")) { df <-get(df_name) df <- df %>%mutate(Event_num = event_type_mapping[as.character(event_type)])# Convert to ppp object with Event_num as the mark ppp_object <-as.ppp(df %>% dplyr::select(Event_num))assign(gsub("sf", "ppp_temp", df_name), ppp_object)}# Check the created ppp objectsls(pattern ="myanmar_ppp_temp")
Now, let’s use sparr::spattemp.density() to compute the spatio-temporal kernel density estimation for each quarterly dataset.
4.0.3 Function to convert ppp objects to Kilometers (km)
# Function to convert ppp objects to kilometersconvert_to_km <-function(ppp_object) {print(unitname(ppp_object)) ppp_km <-rescale.ppp(ppp_object, s =1000, unitname =c("km", "km"))return(ppp_km)}
Apply the conversion function to all myanmar_ppp_temp objects
ppp_temps <-ls(pattern ="myanmar_ppp_temp")for (ppp_name in ppp_temps) { ppp_object <-get(ppp_name)# Convert the ppp object to kilometers ppp_km <-convert_to_km(ppp_object)assign(gsub("temp", "temp_km", ppp_name), ppp_km)}
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
unit / units
You can now use this helper function in your context for each event type (violence_civilians, explosions_remote_violence, strat_devs, battles). Below is how you can set it up for each category with the appropriate ppp_km objects.
combine_ppp_with_owin <-function(ppp_name, ppp_object) {# Extract the relevant part of the name to match the owin object owin_name <-gsub("ppp_temp_km", "owin", ppp_name)# Get the corresponding owin object owin_object <-get(owin_name)# Combine the ppp object with the corresponding owin window combined_ppp <- ppp_object[owin_object]return(combined_ppp)}plot_combined_ppp <-function(combined_ppp, main_title ="Combined PPP Plot") {# Plot the ppp object with titleplot(combined_ppp, main = main_title)}# Helper function to compute the spatial KDE for ppp objectscompute_kde <-function(ppp_object, sigma_value =1000) {# Compute the KDE using a Gaussian kernel kde_result <-density.ppp(ppp_object, sigma = sigma_value, edge =TRUE)return(kde_result)}
4.0.5 Loop Through Each Event Type List and Compute KDE
Next, We will loop through each event type list (b_ppp_list, erv_ppp_list, vac_ppp_list, sd_ppp_list) and compute the KDE layers for each ppp object.
helper function to compute spatio-temporal KDE using the sparr::spattemp.density function, where we can specify the spatial (h) and temporal (lambda) bandwidths.
quarter_time_mapping <-list("2021_Q1"=1,"2021_Q2"=2,"2021_Q3"=3,"2021_Q4"=4,"2022_Q1"=5,"2022_Q2"=6,"2022_Q3"=7,"2022_Q4"=8,"2023_Q1"=9,"2023_Q2"=10,"2023_Q3"=11,"2023_Q4"=12,"2024_Q1"=13,"2024_Q2"=14)assign_temporal_marks <-function(ppp_object, quarter_string) {if (quarter_string %in%names(quarter_time_mapping)) { base_time_mark <- quarter_time_mapping[[quarter_string]] num_points <-npoints(ppp_object) time_variation <-runif(num_points, min =0, max =0.25) # ppp_object$marks <- base_time_mark + time_variation } else {stop("Quarter string not found in mapping") }return(ppp_object)}compute_spatio_temporal_kde <-function(ppp_object, h_value =1000, lambda_value =1) {# Ensure that the temporal marks exist and are numericif (is.numeric(ppp_object$marks)) { time_range <-range(ppp_object$marks)print(time_range[1])print(time_range[2])# Ensure tlim is valid (tlim[1] < tlim[2])if (time_range[1] < time_range[2]) { kde_result <-spattemp.density(ppp_object, h = h_value, lambda = lambda_value, tlim = time_range)return(kde_result) } else {stop("Invalid temporal range: tlim[1] must be < tlim[2]") } } else {stop("Temporal marks are not numeric in the ppp object.") }}
b_ppp_list <-list("myanmar_ppp_temp_km_2021_Q1_b"=get("myanmar_ppp_temp_km_2021_Q1_b"),"myanmar_ppp_temp_km_2021_Q2_b"=get("myanmar_ppp_temp_km_2021_Q2_b"),"myanmar_ppp_temp_km_2021_Q3_b"=get("myanmar_ppp_temp_km_2021_Q3_b"),"myanmar_ppp_temp_km_2021_Q4_b"=get("myanmar_ppp_temp_km_2021_Q4_b"))erv_ppp_list <-list("myanmar_ppp_temp_km_2021_Q1_erv"=get("myanmar_ppp_temp_km_2021_Q1_erv"),"myanmar_ppp_temp_km_2021_Q2_erv"=get("myanmar_ppp_temp_km_2021_Q2_erv"),"myanmar_ppp_temp_km_2021_Q3_erv"=get("myanmar_ppp_temp_km_2021_Q3_erv"),"myanmar_ppp_temp_km_2021_Q4_erv"=get("myanmar_ppp_temp_km_2021_Q4_erv"))sd_ppp_list <-list("myanmar_ppp_temp_km_2021_Q1_sd"=get("myanmar_ppp_temp_km_2021_Q1_sd"),"myanmar_ppp_temp_km_2021_Q2_sd"=get("myanmar_ppp_temp_km_2021_Q2_sd"),"myanmar_ppp_temp_km_2021_Q3_sd"=get("myanmar_ppp_temp_km_2021_Q3_sd"),"myanmar_ppp_temp_km_2021_Q4_sd"=get("myanmar_ppp_temp_km_2021_Q4_sd"))vac_ppp_list <-list("myanmar_ppp_temp_km_2021_Q1_vac"=get("myanmar_ppp_temp_km_2021_Q1_vac"),"myanmar_ppp_temp_km_2021_Q2_vac"=get("myanmar_ppp_temp_km_2021_Q2_vac"),"myanmar_ppp_temp_km_2021_Q3_vac"=get("myanmar_ppp_temp_km_2021_Q3_vac"),"myanmar_ppp_temp_km_2021_Q4_vac"=get("myanmar_ppp_temp_km_2021_Q4_vac"))# Combine them and assign temporal markscombined_b_ppp <-combine_ppp_quarters(b_ppp_list, quarter_labels)
Warning: 70 points were rejected as lying outside the specified window
Warning: data contain duplicated points
kde_result_b <-spattemp.density(combined_b_ppp, h =1000, lambda =2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
plot(kde_result_b)
# Combine and compute KDE for Explosions/Remote Violencecombined_erv_ppp <-combine_ppp_quarters(erv_ppp_list, quarter_labels)
Warning: 91 points were rejected as lying outside the specified window
Warning: data contain duplicated points
kde_result_erv <-spattemp.density(combined_erv_ppp, h =1000, lambda =2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
plot(kde_result_erv)
# Combine and compute KDE for Strategic Developmentscombined_sd_ppp <-combine_ppp_quarters(sd_ppp_list, quarter_labels)
Warning: 20 points were rejected as lying outside the specified window
Warning: data contain duplicated points
kde_result_sd <-spattemp.density(combined_sd_ppp, h =1000, lambda =2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
plot(kde_result_sd)
# Combine and compute KDE for Violence Against Civilianscombined_vac_ppp <-combine_ppp_quarters(vac_ppp_list, quarter_labels)
Warning: 57 points were rejected as lying outside the specified window
Warning: data contain duplicated points
kde_result_vac <-spattemp.density(combined_vac_ppp, h =1000, lambda =2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
plot(kde_result_vac)
helper_func_plot_kde_list <-function(kde_list, xlab ="Distance (km)", ylab ="Density") {for (name innames(kde_list)) {plot(kde_list[[name]], main = name, xlab = xlab, ylab = ylab) }} kde_list_battles <-list()kde_list_erv <-list()kde_list_sd <-list()kde_list_vac <-list()# Combine and compute KDE for Battlescombined_b_ppp <-combine_ppp_quarters(b_ppp_list, quarter_labels)
Warning: 70 points were rejected as lying outside the specified window
Warning: data contain duplicated points
kde_list_battles$Battles <-spattemp.density(combined_b_ppp, h =1000, lambda =2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
# Combine and compute KDE for Explosions/Remote Violencecombined_erv_ppp <-combine_ppp_quarters(erv_ppp_list, quarter_labels)
Warning: 91 points were rejected as lying outside the specified window
Warning: data contain duplicated points
kde_list_erv$ExplosionsRemoteViolence <-spattemp.density(combined_erv_ppp, h =1000, lambda =2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
# Combine and compute KDE for Strategic Developmentscombined_sd_ppp <-combine_ppp_quarters(sd_ppp_list, quarter_labels)
Warning: 20 points were rejected as lying outside the specified window
Warning: data contain duplicated points
kde_list_sd$StrategicDevelopments <-spattemp.density(combined_sd_ppp, h =1000, lambda =2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
# Combine and compute KDE for Violence Against Civilianscombined_vac_ppp <-combine_ppp_quarters(vac_ppp_list, quarter_labels)
Warning: 57 points were rejected as lying outside the specified window
Warning: data contain duplicated points
kde_list_vac$ViolenceAgainstCivilians <-spattemp.density(combined_vac_ppp, h =1000, lambda =2)
Calculating trivariate smooth...Done.
Edge-correcting...Done.
Conditioning on time...Done.
# Now use the helper_func_plot_kde_list for each event type# Plot Battles KDEhelper_func_plot_kde_list(kde_list_battles, xlab ="Distance (km)", ylab ="Density")
For deeper insights into spatial clustering, I performed Second-Order Spatial Point Patterns Analysis using the K-function and L-function. These functions measure the spatial interaction between points at various distances, helping me detect whether conflict events are clustered, dispersed, or randomly distributed across space.
K-function: This function computes the expected number of events within a given distance from a random event. If the observed K-function lies above the theoretical K-function, it suggests clustering; if below, it indicates dispersion.
L-function: A linearized version of the K-function, the L-function makes it easier to interpret the results. A plot of the L-function helps visualize whether conflict events tend to cluster or disperse over different scales.
I filtered the conflict data by year to analyze temporal changes in spatial clustering. This helps in understanding how conflict intensity or distribution has evolved over time.
filter_ppp_by_year <-function(ppp_obj, year) {# Extract the year part from marks or events event_years <-marks(ppp_obj)# Filter the ppp object based on the provided year filtered_ppp <- ppp_obj[event_years == year]return(filtered_ppp)}# Helper function for K-function calculation and plottingcompute_k_function <-function(ppp_obj, years) { k_results <-list()for (year in years) { filtered_ppp <-filter_ppp_by_year(ppp_obj, year)if (npoints(filtered_ppp) >0) { # Ensure there's data for that year K_result <-Kest(filtered_ppp, correction ="Ripley")plot(K_result, . - r ~ r, main =paste("K-function for", year), xlim =c(0, 1000)) k_results[[as.character(year)]] <- K_result } }return(k_results)}# Helper function for L-function calculation and plottingcompute_l_function <-function(ppp_obj, years) { l_results <-list()for (year in years) { filtered_ppp <-filter_ppp_by_year(ppp_obj, year)if (npoints(filtered_ppp) >0) { # Ensure there's data for that year L_result <-Lest(filtered_ppp, correction ="Ripley")plot(L_result, . - r ~ r, main =paste("L-function for", year), xlim =c(0, 1000)) l_results[[as.character(year)]] <- L_result } }return(l_results)}
# Plot K-function results for battles by yearhelper_func_plot_kde_list(k_battles_results, xlab ="Distance (km)", ylab ="K(d)-r")# Plot L-function results for strategic developments by yearhelper_func_plot_kde_list(l_sd_results, xlab ="Distance (km)", ylab ="L(d)-r")
# Create a directory to store the .rds files if it doesn't existif (!dir.exists("data/rds")) {dir.create("data/rds", recursive =TRUE)}all_objects <-ls()for (obj_name in all_objects) { obj <-get(obj_name) file_path <-paste0("data/rds/", obj_name, ".rds")saveRDS(obj, file = file_path)}# Verify that the files were savedlist.files("data/rds")
Here are the references for the sources and packages used throughout this project:
Armed Conflict Location & Event Data (ACLED):
ACLED Data for Myanmar Armed Conflict (2021-2024).
URL: https://acleddata.com/
R Packages:
sf: Simple Features for R
Pebesma, E. (2018). “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal, 10(1), 439–446.
URL: https://r-spatial.github.io/sf/
spatstat: Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
Baddeley, A., Rubak, E., Turner, R. (2015). “Spatial Point Patterns: Methodology and Applications with R.” Chapman and Hall/CRC Press.
URL: https://spatstat.org/
sparr: Spatial Relative Risk Functions and Kernel Density Estimation
Davies, T.M., Hazelton, M.L. (2010). “Adaptive kernel estimation of spatial relative risk.” Statistics in Medicine, 29(23), 2423-2437.
URL: https://cran.r-project.org/web/packages/sparr/
tidyverse: A Collection of R Packages for Data Science
Wickham, H., Averick, M., Bryan, J., Chang, W., et al. (2019). “Welcome to the Tidyverse.” Journal of Open Source Software, 4(43), 1686.
URL: https://www.tidyverse.org/
raster: Geographic Data Analysis and Modeling
Hijmans, R.J. (2021). “raster: Geographic Data Analysis and Modeling.” R package version 3.4-10.
URL: https://cran.r-project.org/web/packages/raster/
maptools: Tools for Handling Spatial Objects
Bivand, R., & Lewin-Koh, N. (2021). “maptools: Tools for Handling Spatial Objects.” R package version 1.1-2.
URL: https://cran.r-project.org/web/packages/maptools/
ChatGPT: Assistance in Code Development and Analysis Guidance
OpenAI’s GPT-4-based language model, ChatGPT. Assisted in code optimization, data analysis, and methodology insights for geospatial analytics using R.
URL: https://openai.com/chatgpt